<?xml version="1.0"?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:html="http://www.w3.org/1999/xhtml">
  <atom:id>http://bill.welliver.org/atom/pike/Quick Hacks</atom:id>
  <atom:title type="text">electronic.alchemy :: Quick Hacks</atom:title>
  <atom:updated>2026-04-12T23:36:29-04:00</atom:updated>
  <atom:link href="http://bill.welliver.org/atom/pike/Quick Hacks" type="application/atom+xml"></atom:link>
  <atom:link href="http://bill.welliver.org/space/pike/Quick Hacks" type="text/html"></atom:link>
  <atom:link href="http://bill.welliver.org/rss/pike/Quick Hacks" type="application/rss+xml"></atom:link>
  <atom:generator uri="http://modules.gotpike.org/blahblah/Public.Syndication.ATOM" version="0.1">Public.Syndication.ATOM (Pike v8.0 release 702)</atom:generator>
  <atom:icon>http://bill.welliver.org/favicon.ico</atom:icon>
  <atom:logo>http://bill.welliver.org/static/images/alchemy.gif</atom:logo>
  <atom:subtitle type="xhtml"><html:div xmlns:html="http://www.w3.org/1999/xhtml"><html:h3 class="heading-1">Clean up trashed SQLite blobs</html:h3><html:p class="paragraph"/>
Sometimes, especially when converting data to sqlite, you'll find that fields in a record with binary data are marked internally as text data. What this means is that Pike will try to treat the data as UTF-8 and convert it to a pike string. Because binary data isn't always valid UTF-8. The fix is to re-store the offending data, making sure it's marked as binary, rather than text.<html:p class="paragraph"/>
The following snippet is an example of how to do this. In our example table, called object_Versions, the offending field is called "contents". Since some of the rows are fine (perhaps those records are storing real text), we only re-store the rows that fail. To find the failing rows, fetch each row individually.<html:p class="paragraph"/>
You can use the SQLite "CAST" operator to force the data marked as text to be retrieved as binary (BLOB) data. Then, you can put it back.<html:p class="paragraph"/>
<html:div class="code"><html:pre><html:pre>&#xD;
<html:b><html:font color="darkgreen">array </html:font></html:b><html:b><html:font color="darkbrown">ov</html:font></html:b> = s-&gt;query(<html:i><html:font color="darkred">"select id from object_versions"</html:font></html:i>);<html:p class="paragraph"/>
<html:b><html:font color="darkblue">foreach</html:font></html:b>(ov; ; <html:b><html:font color="darkgreen">mapping </html:font></html:b><html:b><html:font color="darkbrown">v</html:font></html:b>)&#xD;
{           &#xD;
  <html:b><html:font color="darkgreen">mixed </html:font></html:b><html:b><html:font color="darkbrown">err</html:font></html:b> = <html:b><html:font color="darkblue">catch</html:font></html:b>(s-&gt;query(<html:i><html:font color="darkred">"select contents from object_versions where id=:id"</html:font></html:i>, ([<html:i><html:font color="darkred">":id"</html:font></html:i>:(<html:b><html:font color="darkgreen">int</html:font></html:b><html:b><html:font color="darkbrown"/></html:b>)v[<html:i><html:font color="darkred">"object_versions.id"</html:font></html:i>]])));            &#xD;
  <html:b><html:font color="darkblue">if</html:font></html:b>(err) <html:font color="red">// ah, a row with a problem. let's fix it...&#xD;
</html:font>  {                                                                                                                             &#xD;
    werror(<html:i><html:font color="darkred">"failed to fetch id %d&amp;#110;"</html:font></html:i>, (<html:b><html:font color="darkgreen">int</html:font></html:b><html:b><html:font color="darkbrown"/></html:b>)v[<html:i><html:font color="darkred">"object_versions.id"</html:font></html:i>]);                                                                  &#xD;
    <html:b><html:font color="darkgreen">mixed </html:font></html:b><html:b><html:font color="darkbrown">t</html:font></html:b> = s-&gt;query(<html:i><html:font color="darkred">"select CAST(contents as blob) as v from object_versions where id=:id"</html:font></html:i>, ([<html:i><html:font color="darkred">":id"</html:font></html:i>:(<html:b><html:font color="darkgreen">int</html:font></html:b><html:b><html:font color="darkbrown"/></html:b>)v[<html:i><html:font color="darkred">"object_versions.id"</html:font></html:i>]]));&#xD;
    s-&gt;query(<html:i><html:font color="darkred">"update object_versions set contents=:contents where id=:id"</html:font></html:i>, ([<html:i><html:font color="darkred">":id"</html:font></html:i>:(<html:b><html:font color="darkgreen">int</html:font></html:b><html:b><html:font color="darkbrown"/></html:b>)v[<html:i><html:font color="darkred">"object_versions.id"</html:font></html:i> , <html:i><html:font color="darkred">":contents"</html:font></html:i>: t[0][<html:i><html:font color="darkred">"object_versions.v"</html:font></html:i>]]));&#xD;
  }&#xD;
}&#xD;
</html:pre></html:pre></html:div><html:p class="paragraph"/>
<html:h3 class="heading-1">Carrot2 Document Clustering</html:h3><html:p class="paragraph"/>
Interface with <html:a href="Carrot2" class="wiki_link_external">http://www.carrot2.org</html:a>'s Document Clustering Server via XMLRPC<html:p class="paragraph"/>
<html:div class="code"><html:pre><html:pre>&#xD;
<html:b><html:font color="darkgreen">object </html:font></html:b><html:b><html:font color="darkbrown">x</html:font></html:b> = Protocols.XMLRPC.Client(<html:i><html:font color="darkred">"http://localhost:8081/xmlrpc/processor"</html:font></html:i>);<html:p class="paragraph"/>
<html:font color="red">// input data is an array whose length is a multiple of 4.&#xD;
</html:font><html:font color="red">// each document input has 4 fields, so document n can be found at&#xD;
</html:font><html:font color="red">// inputdata[(4*n) .. (4*n) + 3]&#xD;
</html:font><html:font color="red">//&#xD;
</html:font><html:font color="red">// all 4 fields are required and are: &#xD;
</html:font><html:font color="red">// [0]id, [1]url, [2]title, [3]excerpt&#xD;
</html:font><html:font color="red">//&#xD;
</html:font><html:font color="red">// the sample data contains 1 input document.&#xD;
</html:font><html:b><html:font color="darkgreen">array </html:font></html:b><html:b><html:font color="darkbrown">inputdata</html:font></html:b> = ({<html:i><html:font color="darkred">"id0"</html:font></html:i>, <html:i><html:font color="darkred">"http://www.google.com"</html:font></html:i>, <html:i><html:font color="darkred">"google"</html:font></html:i>, <html:i><html:font color="darkred">"the google search engine"</html:font></html:i>});<html:p class="paragraph"/>
<html:b><html:font color="darkgreen">array </html:font></html:b><html:b><html:font color="darkbrown">clusters</html:font></html:b> = x[<html:i><html:font color="darkred">"cluster.doCluster"</html:font></html:i>](<html:i><html:font color="darkred">"test query"</html:font></html:i>, &#xD;
      ([<html:i><html:font color="darkred">"dcs.clusters.only"</html:font></html:i>:0]), ([]), inputdata)[0];<html:p class="paragraph"/>
  <html:b><html:font color="darkblue">foreach</html:font></html:b>(clusters;; <html:b><html:font color="darkgreen">mapping </html:font></html:b><html:b><html:font color="darkbrown">cl</html:font></html:b>)&#xD;
    write(<html:i><html:font color="darkred">"%s (%d)&amp;#110;"</html:font></html:i>, cl-&gt;label, sizeof(cl-&gt;documents));<html:p class="paragraph"/>
</html:pre></html:pre></html:div>
</html:div></atom:subtitle>
</atom:feed>
